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ABSTRACT 


The sequence notation suggested in [14] provides a tool for the clear
and precise specification of systolic computations. Namely, it
separates the static and dynamic levels of the specification. At the
static level, the topology of the network and the function of each
cell are described by a system of causal equations on sequences, and
at the dynamic level, the data flow is described by the elements of
the individual sequences.

In this paper, we describe a method for the transformation of a given
algorithm into a system of causal sequence equations/input-output
description which specifies a systolic computation. The basic idea of
the method is to pack arrays of variables along one or more
dimensions into sequences. Doing this, however, may result in a
system of equations that is not causal, and hence, a transformation
of indices in the original algorithm may be essential in order to
guarantee causality (the positive increment of time).

The derivation of index transformations from the data dependence
vectors of an algorithm has been discussed in the literature.
However, data dependence vectors do not carry any information about
the absolute values of the indices, and hence allow only the
derivation of linear transformations. In order to overcome this
problem, we suggest a method for the derivation of the index
transformation from <used,defined> pairs. These pairs retain
information about the absolute values of the indices, and thus allow
for non-linear transformations.

Although the model of [14] allows arbitrary interconnections in
systolic networks, our design technique is restricted to the class of
networks in which the interconnection pattern may be non-linear only
along specific directions. Ring-like networks are elements of this
class.


In the past few years, many formal techniques have been suggested
for the design of VLSI computations, in general, and of systolic
computations, in particular. These techniques include the systematic
mapping of wavefront-like computations into hardware (e.g. [5,7]),
the derivation of alternative systolic networks from a given,
provably correct, design (e.g. [8,9,10]), and the reindexing of the
variables in a given algorithm such that the dependence between the
variables suits VLSI implementation. This latter technique was first
suggested by Kuhn [6], and later studied carefully by Moldovan et
al. [16], Miranker et al. [15], and Quinton et al. [17]. Cappello et
al. [1] also conceived this reindexing from a geometric point of
view, and Ipsen et al. [3] extended the idea to include the data
dependence between coupled systems. Other techniques were also
suggested for the search of an optimal systolic network in a
restricted class of networks [11], and for the mapping of an acyclic
program graph into a linear array [18].

Of the above techniques, reindexing seems to be the most promising
and general one for mapping a given computation into a systolic
implementation. It is described briefly as follows: First, the
computation is written in the form of an algorithm consisting of
nested loops or recurrence formulas. Each variable in the algorithm
should be an element of an n+1 dimensional array, for some
$n \geq 1$, and hence may be associated with a position in an n+1
dimensional space that we call here the "computation space". In this
space, the "Dependence Vector" of a data item may be defined as the
vector joining the positions at which the item is defined and used.
One of the dimensions in the computation space is chosen to represent
the "Time", and a specific space transformation is derived such that
all the dependence vectors are mapped into new vectors that have
positive components along the time dimension. The interconnection
pattern of a network that may implement the given computation, and
the speed of the data movement in the network, are then determined by
the components of the transformed dependence vectors.

The derivation of the space transformation from the dependence
vector excludes any transformation that depends on the absolute
position of the data in the computation space (called nonlinear
transformations in [16]). In order to overcome this
deficiency,  Chen  [2]  suggested  a  technique  in  which  the  space 
transformation  is  accomplished  through  a  point  by  point  mapping. 
In  addition,  the  Chen  technique  carries  along  the  entire  algo¬ 
rithm  (first  order  recursive  equations)  during  the  design  pro¬ 
cess,  yielding  a  precise  and  complete  specification  of  the  sys¬ 
tolic  computation.  This  is  a  clear  advantage  over  the  previous 
reindexing  techniques,  where  the  specification  of  each  cell  and 
the  description  of  the  input  have  to  be  sought  separately  through 
a  repeated  application  of  the  linear  transformation  to  different 
points  in  the  computation  space. 

In this paper, we present a technique that is based on the formal
model of [14]. It is a reindexing technique in which the space
transformation is derived from <defined,used> pairs of the data items
instead of the dependence vectors. This allows transformations that
are position dependent (non-linear) and yet avoids the point by point
mapping of the space.

As  in  [2],  our  technique  carries  along  the  entire  descrip¬ 
tion  of  the  computation  during  the  design  process.  More  specifi¬ 
cally,  given  a  canonical  algorithm,  where  each  data  item  is  asso¬ 
ciated  with  a  position  in  the  computation  space,  the  data  items 
along the "time" dimension(s) are compacted into data sequences.
A sequence transformation is then applied to enforce "causality",
a  condition  that  ensures  the  positive  increment  of  time.  The 
resulting  system  of  causal  equations  specifies  precisely  the 
topology  of  the  network,  as  well  as  the  operation  of  each  cell 
and  the  description  of  the  appropriate  inputs. 

The  formal  model  [14]  that  supports  our  technique  does  not 
put  any  restriction  on  the  topology  of  systolic  networks.  This 
allows  the  derivation  of  a  wide  range  of  systolic  computations 
that  may  not  be  derived  by  any  technique  that  imposes  the  condi¬ 
tion  of  local  communications  at  the  algorithmic  level  (e.g.  [2]). 
For example, the shortest path multistage network, derived in
Section 6, may only be implemented on a network with global
feedback, that is, a ring-like architecture.

Another advantage of the technique presented in this paper is the
natural translation of multi-time dimensions into multistage
networks. For example, if two dimensions of the design space are
associated with time, then data items along these two dimensions may
be easily packed into data sequences. The resulting system of
equations then describes a multistage network where a coarse clock
determines the beginning and end of each phase, and a fine clock
determines the cycles within each phase.

In the next section, we introduce systems of Causal Canonic Sequence
equations (CCS), and we show that any CCS specifies a systolic
computation. In the following three sections, we
describe  the  different  steps  involved  in  the  transformation  of  a 
given  algorithm  into  a  CCS.  These  steps  are  illustrated  by  an 
example  of  a  computation  for  the  solution  of  banded,  triangular 
linear  systems.  The  multistage  network  derived  in  Section  6 
shows  the  capability  of  the  technique  to  handle  multi-time  dimen¬ 
sions  and  global  feed-back  loops,  and  the  dynamic  programming 
network  of  Section  7  is  an  example  where  non-linear  sequence 
transformation  may  be  applied. 


2. Causal Canonic Systems


A  systolic  network  is  defined  in  [14]  to  be  a  network  of 
cells  (computational  and  I/O)  where  each  communication  link  is 
unidirectional  and  each  computational  cell  repeats  indefinitely 
the  execution  of  a  specific  cycle  of  the  form:  1)  Read  data  from 
the  input  links,  2)  perform  a  specific  computation,  and  3)  write 
the  results  on  the  output  links.  The  initiation  of  the  cycles  in 
the  different  cells  is  synchronized  by  a  global  clock. 

With  this  definition,  any  computation  on  a  given  systolic 
network  N  may  be  precisely  specified  as  follows: 

1) Assign to each cell in N a unique label $l \in I^n$, where $I^n$ is
the set of n-tuples of integers. If N is a linear or a two
dimensional array, then the usual choice of n is 1 and 2,
respectively.

2) Identify each link in N by a pair <y,l> (written as $y_l$), where
$l$ is the label of the cell at which the link terminates and y is a
color assigned to the link. The only restriction on link colors is
that links terminating at the same cell should have different colors.
In this paper, links that are directed from a cell to itself will be
allowed. This type of direct feedback may be used to store
information from one cycle to the next, and thus models an internal
register in the cell.

3) Associate with each link $y_l$ a data sequence $\eta_l$ ($\eta$ is
the Greek letter corresponding to y). The $i$th element of $\eta_l$,
namely $\eta_l(i)$, is the data item that appears on $y_l$ at the
beginning of cycle i. A special item $\delta$ is used to indicate a
"don't know" or a "don't care" element.

4) For each computational cell v in N, specify the operation of v by
a set $E_v$ that contains one sequence equation for each output link
of v. More specifically, if $y_u$ is an output link of v, then
include in $E_v$ an equation of the form

    $\eta_u = \Gamma_v(\alpha_v, \beta_v, \gamma_v, \ldots)$          (1)

where $a_v, b_v, c_v, \ldots$ are input links to v, and $\Gamma_v$ is
a causal sequence operator that specifies, for any time t, the output
item $\eta_u(t)$ in terms of the previous input items
$\alpha_v(\tau), \beta_v(\tau), \ldots$, $\tau < t$. Many sequence
operators are defined in [12] and [14]. In the appendix, we define
the few operators that will be used in the examples of this paper.

5)  Specify  the  elements  of  the  sequences  associated  with  the 
input  links  of  the  network  (  the  output  links  of  input  cells). 

6)  Identify  the  output  data  items. 




The system of equations obtained in 4, in addition to the input and
output specifications described in 5 and 6, respectively, specify
completely the systolic computation. It may be easily seen that this
system of equations/input-output specifications satisfies the
following conditions:

CS1: Each sequence in the system is indexed by a label $l \in I^n$,
     for a fixed n. Moreover, all the sequences that appear in the
     right side of any specific equation are indexed by the same
     label (v in equation (1)).




CS2: The system is well defined and consistent. In other words, any
     sequence that appears in the system is defined exactly once,
     either in the input specification or as the left side of a
     sequence equation.

Definition 1: An equation of the form (1), and the associated
operator, are called causal if, for any t, $t \geq 1$, $\eta_u(t)$
does not depend on any element $\alpha_v(\tau), \beta_v(\tau),
\ldots$, for any $\tau \geq t$. □

Definition 2: A system of equations/input-output specifications is
called canonic if it satisfies the above two conditions. If, in
addition, each sequence equation in the system is causal, then the
system is called a causal canonic system, denoted from now on by
CCS. □

Proposition 1: Any CCS specifies a systolic computation.

Proof: We will obtain the systolic computation specified by the
given CCS by constructing the underlying systolic network N as
follows:

Let L be the set that contains all the indices of the sequences that
appear in the CCS, and construct for each index $v \in L$ a cell
labeled v. Partition the equations in the CCS into mutually exclusive
sets of equations, where each set $E_v$ contains the equations whose
right side sequences are indexed by the index v. Now, consider each
set $E_v$. By CS1, each equation in $E_v$ has the form (1). For each
such equation, construct a link directed from cell v to cell u.
Finally, for each sequence $\eta_l$ specified in the input
specification part of the CCS, construct an input cell and a link
directed from that cell to cell $l$. The label of the input cell may
be assigned arbitrarily.

Given the topology of N, the operation of each cell v in N is then
described by the equations in $E_v$. Condition CS2 guarantees that
the input to each cell is an output of a cell in N (possibly an input
cell), and that the output of each cell is uniquely defined. □
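
The construction used in the proof can be sketched in a few lines of
Python. This is only an illustrative sketch under assumptions made
here: a sequence is represented by a hypothetical pair (color, label),
an equation by a pair (left side, list of right side sequences), and
the names `build_network`, `equations` and `inputs` are not notation
from [14].

```python
from collections import defaultdict


def build_network(equations, inputs):
    """Build the cells and links of the network specified by a canonic system.

    equations: list of (lhs, rhs); lhs = (color, label) is the sequence being
               defined, rhs is a list of (color, label) sequences that must
               all carry the same label (condition CS1).
    inputs:    list of (color, label) sequences given in the input part.
    Returns (computational cell labels, links, E_v grouped by label v).
    """
    cells = set()
    links = []                 # entries are (source cell, destination cell, color)
    ops = defaultdict(list)    # E_v: equations grouped by their right side label
    for lhs, rhs in equations:
        labels = {label for _, label in rhs}
        if len(labels) != 1:
            raise ValueError("CS1 violated: right side mixes labels")
        v = labels.pop()
        color, u = lhs
        cells.update((u, v))
        links.append((v, u, color))      # link directed from cell v to cell u
        ops[v].append((lhs, rhs))
    for color, label in inputs:
        cells.add(label)
        # The label of an input cell is arbitrary; a tagged tuple is used here.
        links.append((("input", color, label), label, color))
    defined = [lhs for lhs, _ in equations] + list(inputs)
    if len(defined) != len(set(defined)):
        raise ValueError("CS2 violated: a sequence is defined more than once")
    return cells, links, dict(ops)
```

The sketch only checks the "defined at most once" half of CS2; it does
not verify that every sequence used on a right side is defined
somewhere.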

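Example: Let $A = \{a_{i,j};\ i=1,\ldots,n,\ j=i-m,\ldots,i\}$ be a
band lower triangular matrix, and let the vectors
$b = \{b_i;\ i=1,\ldots,n\}$ and $x = \{x_i;\ i=1,\ldots,n\}$ satisfy
$Ax = b$ (see ALG1 in Sec. 3). Consider the following CCS:

INPUT{ $\hat\gamma_j = \Omega^j \Theta \gamma_j$, j=1,...,m+1 ;
       $\hat\beta_{m+1} = \Omega^{m+1} \Theta \beta_{m+1}$ ; $\hat\xi_1 = \iota$ ;
       where for t=1,...,n,
       $\gamma_j(t) = a_{t,t+j-m-1}$, $\beta_{m+1}(t) = b_t$, and $\iota(t) = 0$ } ;

    $\hat\xi_{j+1} = \Omega\, [\hat\xi_j + \hat\gamma_j * \hat\zeta_j]$,   j=1,...,m          (2.a)

    $\hat\zeta_{j-1} = \Omega_0^{j}\, \Omega^{-j+1}\, \hat\zeta_j$,   j=1,...,m          (2.b)

    $\hat\zeta_{j-1} = \Omega_0^{j}\, \Omega^{-j+1}\, [[\hat\beta_j - \hat\xi_j] / \hat\gamma_j]$,   j=m+1          (2.c)

OUTPUT{ $x_i = \hat\zeta_m(m+1+2i)$ ; i=1,...,n }.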




For the above CCS we have $L = \{1,\ldots,m+1\}$, $E_i$ = { eqs
(2.a/b) } for i=1,...,m, and $E_{m+1}$ = { eq (2.c) }. The
corresponding network is shown in Fig. 1, where input/output cells
are omitted. Note that the terms $\Omega_0^{j}$ in (2.b/c) indicate
that the first j elements on the link $z_{j-1}$ should be forced to
zero. This is important because it saves the values on the x links
from destruction due to operations involving don't cares. By
(2.a/b), cells 1,...,m are multiply/add cells and cell m+1 is a
subtract/divide cell. By the definition of $\hat\gamma_j$, the
elements of the (j-m-1)th sub diagonal of A are supplied on the link
$c_j$ starting at time j+1 and separated from each other by one time
unit. The inputs on $b_{m+1}$ and $x_1$, as well as the outputs on
$z_m$, are also specified precisely in the CCS. □

Fig. 1 - A network for forward substitution.


Hence,  by  Proposition  1,  the  task  of  designing  a  systolic 
computation  for  a  given  algorithm  is  reduced  to  that  of  deriving 
a  CCS  equivalent  to  the  algorithm.  This  derivation  may  be  accom¬ 
plished  by  first  transforming  the  algorithm  into  a  canonic  form, 
then  rewriting  the  canonic  algorithm  in  the  form  of  a  canonic 
system  of  sequence  equations/ input-output  specifications.  If  the 
system  is  not  causal,  then  a  sequence  transformation  may  be 
applied  to  enforce  causality  and  obtain  a  CCS  that  specifies  a 
systolic computation. If more than one sequence transformation is
possible, then the one that reduces the execution time of the
computation should be identified and chosen. In the next three
sections, we explain each of the above steps in detail.


3. Canonic algorithms


Kuhn [6] defines a naive algorithm as one that is written without
regard to possible VLSI implementations. In order to design a
systolic computation for a naive algorithm, we start by rewriting the
algorithm in a canonic form:

Definition 3: A canonic algorithm is composed of an input statement
(equivalent to a read statement), a body, and an output statement
(equivalent to a write statement). The body of the algorithm is
constructed from arbitrary nested DO loops that enclose assignment,
or conditional assignment statements, where the latter is of the form
"IF predicate THEN assignment". The following conditions should also
be satisfied:

CA1: Each variable is an element of an n+1-dimensional array for some
     fixed n, $n \geq 1$, and each assignment statement is executed
     in the context of n+1 nested loops. Moreover, if S is an
     assignment statement that is executed at some instance
     $i_1, \ldots, i_{n+1}$ of the n+1 loop indices, then each
     variable in the right side of S should be the
     $(i_1, \ldots, i_{n+1})$th entry of an array.

CA2: The value of each variable should be defined exactly once before
     it is used (via either an input statement or an assignment
     statement).

CA3: If S is an assignment statement that is executed at some
     instance $i_1, \ldots, i_{n+1}$ of the loop indices, and the
     variable in the left side of S is the $(j_1, \ldots, j_{n+1})$th
     entry of


solution of the linear system $Ax = b$, where $A = \{a_{i,j}\}$ is an
$n \times n$ lower triangular, banded matrix, with band-width m+1,
and $b = \{b_i\}$ is an n-dimensional vector. In order to avoid loop
bounds of the form max{1,i-m}, we assume that $a_{i,j} = 0$ for
$i \leq m$, $j = i-m, \ldots, 0$.


ALG1: Naive forward substitution.

INPUT{ x_i = 0,        i=1-m,...,n ;
       a_{i,j}, b_i    i=1,...,n, j=i-m,...,i } ;
DO i=1,n
  { DO j=1,m
      x_i = x_i + a_{i,i+j-m-1} * x_{i+j-m-1} ;
    x_i = (b_i - x_i) / a_{i,i} } ;
OUTPUT{ x_i, i=1,...,n }.
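
As an illustration, ALG1 can be transcribed almost line by line into
Python. This is a sketch, not part of the original algorithm text:
the function name `forward_substitution` and the dictionary
representation of A and b (1-based keys, missing band entries read as
zero) are assumptions made here.

```python
def forward_substitution(a, b, m):
    """Solve A x = b for a banded lower triangular A of band-width m+1.

    a: dict mapping (i, j) -> a_{i,j}, 1-based; entries that are absent
       (in particular those with j <= 0) are treated as zero.
    b: dict mapping i -> b_i for i = 1..n.
    Returns [x_1, ..., x_n].
    """
    n = len(b)
    # INPUT: x_i = 0 for i = 1-m, ..., n
    x = {i: 0.0 for i in range(1 - m, n + 1)}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = i + j - m - 1                  # column index, runs over i-m .. i-1
            x[i] = x[i] + a.get((i, k), 0.0) * x[k]
        x[i] = (b[i] - x[i]) / a[(i, i)]
    return [x[i] for i in range(1, n + 1)]
```

Note that `x[i]` is overwritten inside the loops, which is exactly the
kind of reuse that the canonic form below removes.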


First,  we  rewrite  the  algorithm  such  that  each  statement  is 
nested  within  two  loops,  and  each  variable  is  an  element  in  a  two 
dimensional  array. 

INPUT{ x(i,m+2)=0, i=1-m,...,0 ; x(i,1)=0, i=1,...,n ;
       a(i,j) = a_{i,j}, b(i,m+1) = b_i, i=1,...,n, j=i-m,...,i } ;
DO i=1,n
  DO j=1,m+1
    { IF j≤m THEN x(i,j+1) = x(i,j) + a(i,i+j-m-1) * x(i+j-m-1,m+2) ;
      IF j=m+1 THEN x(i,m+2) = ( b(i,j) - x(i,j) ) / a(i,i) } ;
OUTPUT{ x_i = x(i,m+2), i=1,...,n }.


Now, in order to satisfy CA1, we define the new variables
c(i,j) = a(i,i+j-m-1) and z(i,j) = x(i+j-m-1,m+2). The first
substitution is trivial; the second, however, is an expansion of the
column (x(k,m+2) ; k=1-m,...,n) into a two dimensional array z.
Because the indices i and j are added in x(i+j-m-1,m+2), then, with
the appropriate initial assignment, z may be expanded by using either
z(i-1,j+1) = z(i,j) or z(i+1,j-1) = z(i,j). It may be shown that the
first expansion leads to an algorithm where data are used before they
are defined, thus violating CA2. Hence, we pursue the second
expansion, which is sketched in Fig 2. More precisely:


    z(1,j)     = x(j-m,m+2)    j=1,...,m
    z(i+1,m)   = x(i,m+2)      i=1,...,n
    z(i+1,j-1) = z(i,j)        i=1,...,n, j=1,...,m

Fig 2 - expansion of a vector into a matrix

The  incorporation  of  this  expansion  into  the  above  algorithm 
gives  the  following: 


ALG2: Canonic forward substitution.

INPUT{ z(1,j)=0, j=1,...,m ; x(i,1)=0, i=1,...,n ;
       c(i,j)=a_{i,i+j-m-1}, b(i,m+1)=b_i, i=1,...,n, j=1,...,m+1 } ;
DO i=1,n
  DO j=1,m+1
    { IF j ≤ m THEN { x(i,j+1) = x(i,j) + c(i,j) * z(i,j) ;
                      z(i+1,j-1) = z(i,j) } ;
      IF j = m+1 THEN z(i+1,m) = ( b(i,j) - x(i,j) ) / c(i,j) } ;
OUTPUT{ x_i = z(i+1,m), i=1,...,n }.
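
A direct Python transcription of ALG2 may help to see that every
array entry is written exactly once (CA2) while the result still
solves $Ax = b$. This is a sketch under assumptions made here: the
helper `define`, the function name and the dictionary representation
of the arrays are illustrative and not part of the original
algorithm.

```python
def define(array, key, value):
    """Single-assignment write: fail if the entry was already defined (CA2)."""
    if key in array:
        raise ValueError(f"entry {key} defined twice")
    array[key] = value


def canonic_forward_substitution(a, b, m):
    """ALG2 with 1-based dict arrays x(i,j), z(i,j), c(i,j); returns [x_1..x_n]."""
    n = len(b)
    x, z, c = {}, {}, {}
    # INPUT part
    for j in range(1, m + 1):
        define(z, (1, j), 0.0)
    for i in range(1, n + 1):
        define(x, (i, 1), 0.0)
        for j in range(1, m + 2):
            define(c, (i, j), a.get((i, i + j - m - 1), 0.0))
    # Body: two nested loops, every variable indexed by both loop indices
    for i in range(1, n + 1):
        for j in range(1, m + 2):
            if j <= m:
                define(x, (i, j + 1), x[(i, j)] + c[(i, j)] * z[(i, j)])
                define(z, (i + 1, j - 1), z[(i, j)])
            if j == m + 1:
                define(z, (i + 1, m), (b[i] - x[(i, j)]) / c[(i, j)])
    # OUTPUT: x_i = z(i+1, m)
    return [z[(i + 1, m)] for i in range(1, n + 1)]
```

Here `b[i]` stands for b(i,m+1), the only entry of the array b that
ALG2 reads.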




Conditions CA1 and CA3 of canonic algorithms establish a one to one
correspondence between the loop indices $(i_1, \ldots, i_{n+1})$ and
the dimensions of the arrays used in the algorithm. In other words, a
given loop index $i_k$ may be used in the algorithm to select
elements of arrays only along the $k$th dimension. Hence, we may
choose one loop index to represent the time, and project the variable
arrays along the corresponding dimension by packing each n+1
dimensional array into an n dimensional sequence array. This
transforms an algorithm which satisfies CA1 and CA2 into a system of
sequence equations which satisfies CS1 and CS2, that is, a canonic
system of sequence equations.
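
The packing step can be illustrated with a small Python sketch. The
function `pack`, its arguments and the use of `None` for an undefined
(don't care) item are assumptions made here for illustration only.

```python
def pack(array, time_axis, n_time):
    """Pack a dict {(i_1, ..., i_k): value} into sequences along one dimension.

    time_axis: 0-based position of the index chosen to represent time.
    n_time:    number of time steps; time values must lie in 1..n_time.
    Returns {label: sequence}, where label is the index tuple with the time
    component removed and sequence[t-1] is the item at time t (None if the
    array does not define that item).
    """
    sequences = {}
    for index, value in array.items():
        t = index[time_axis]
        label = index[:time_axis] + index[time_axis + 1:]
        sequences.setdefault(label, [None] * n_time)[t - 1] = value
    return sequences
```

For a two dimensional array x(i,j), `pack(x, 0, n)` chooses i as the
time and yields one sequence per value of j, while `pack(x, 1, k)`
chooses j as the time and yields one sequence per value of i.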

For example, if we choose i to represent the 'time' in ALG2, then we
may define the sequences $\xi_j$, $\gamma_j$, and $\zeta_j$ as
follows

    $\xi_j(i) = x(i,j)$,   $\gamma_j(i) = c(i,j)$,   $\zeta_j(i) = z(i,j)$

and rewrite ALG2 in the following form:


INPUT{ $\zeta_j(1) = 0$, j=1,...,m ; $\xi_1(i) = 0$, i=1,...,n ;
       $\gamma_j(i) = a_{i,i+j-m-1}$, i=1,...,n, j=1,...,m+1          (3.a)
       $\beta_{m+1}(i) = b_i$, i=1,...,n } ;                           (3.b)
DO i=1,n
  DO j=1,m+1
    { IF j ≤ m THEN { $\xi_{j+1}(i) = \xi_j(i) + [\gamma_j(i) * \zeta_j(i)]$ ;
                      $\zeta_{j-1}(i+1) = \zeta_j(i)$ } ;
      IF j = m+1 THEN $\zeta_m(i+1) = [\beta_j(i) - \xi_j(i)] / \gamma_j(i)$ } ;
OUTPUT{ $x_i = \zeta_m(i+1)$, i=1,...,n }.


The above algorithm uniquely defines those elements of $\xi_j$,
$\gamma_j$, $\zeta_j$ and $\beta_{m+1}$ that are used in the
algorithm. However, by CA2, any element of a sequence that is not
defined in the algorithm is not used, and hence may be set to the
don't care element $\delta$ or assigned an arbitrary value. For
example, $\zeta_j$, j=1,...,m-1, are defined by $\zeta_j(1) = 0$ and
$\zeta_{j-1}(i+1) = \zeta_j(i)$, i=1,...,n. Given that
$\zeta_{j-1}(i+1)$ is not defined in the algorithm for i > n, we may
compact the definitions of $\zeta_{j-1}(i)$ in the form of the
sequence equation $\zeta_{j-1} = \Omega_0 \zeta_j$. Repeating this
for all the sequences gives the following canonic system of sequence
equations:


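INPUT{ $\xi_1 = \iota$, where $\iota(t) = 0$ for any t ;
       $\gamma_j$, j=1,...,m+1, and $\beta_{m+1}$ as in (3.a/b) } ;

    $\xi_{j+1} = \xi_j + [\gamma_j * \zeta_j]$,   j=1,...,m          (4.a)

    $\zeta_{j-1} = \Omega_0\, \zeta_j$,   j=1,...,m          (4.b)

    $\zeta_{j-1} = \Omega_0\, [[\beta_j - \xi_j] / \gamma_j]$,   j=m+1          (4.c)

OUTPUT{ $x_i = \zeta_m(i+1)$, i=1,...,n }.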


The conditional assignment statements in ALG2 depend on j, and hence
the resulting system of equations contains different equations for
different values of j. On the other hand, if the conditional
assignment statement in the canonic algorithm depends on the index
chosen to represent the 'time', then the multiplexing operator has to
be used in order to express the algorithm in sequence form. For
example, if the index j in ALG2 is chosen to represent the 'time',
and the sequences $\xi_i$, $\gamma_i$, $\zeta_i$ and $\beta_i$ are
defined for i=1,...,n by

    $\xi_i(j) = x(i,j)$,   $\zeta_i(j) = z(i,j)$,   $\gamma_i(j) = c(i,j)$


\(j)  =  { 


b ( i , m+ 1 ) 


if  j-1 
if  j>l 


(5. a) 


(5.b) 


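Then, from ALG2, $\zeta_{i+1}(j-1)$ is equal to $\zeta_i(j)$ if
$j \leq m$ and to $[\beta_i(j) - \xi_i(j)] / \gamma_i(j)$ if j=m+1.
Adding to this $\zeta_{i+1}(j-1) = \delta$ for j>m+1 (not defined by
the algorithm), we get

    $\zeta_{i+1} = \Omega^{-1} M^{m,1,\infty}(\zeta_i,\ [\beta_i - \xi_i]/\gamma_i,\ \delta^*)$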

Similarly, we may define $\xi_i$ and derive the following canonic
system of equations in which the sequence $\delta^*$ is defined by
$\delta^*(t) = \delta$ for any $t \geq 1$:

INPUT{ $\gamma_i$ and $\beta_i$, i=1,...,n, as in (5.a/b) } ;

    $\xi_i = \Omega_0\, M^{m,\infty}(\xi_i + [\gamma_i * \zeta_i],\ \delta^*)$,   i=1,...,n          (6.a)

    $\zeta_{i+1} = \Omega^{-1} M^{m,1,\infty}(\zeta_i,\ [\beta_i - \xi_i]/\gamma_i,\ \delta^*)$,   i=1,...,n          (6.b)

OUTPUT{ $x_i = \zeta_{i+1}(m)$, i=1,...,n }.


The  main  difference  between  an  algorithm  and  a  system  of 
sequence  equations  is  that  some  order  of  evaluation  is  imposed  in 
the  algorithm,  while  no  order  is  imposed  on  the  evaluation  of  the 
elements  of  the  sequences  in  a  system  of  sequence  equations. 
However, when a CCS is evaluated in a systolic network, the order of
evaluation is such that the $t$th elements of all the sequences in
the system are evaluated simultaneously, and the evaluation proceeds
in the order t=1,2,.... We call this order an element-wise
evaluation.

Given  that  variables  in  a  canonic  algorithm  cannot  be 
overwritten  (see  CA2),  it  is  clear  that  the  order  of  evaluation 
imposed  by  the  algorithm  is  only  important  because  it  guarantees 
that  each  variable  is  defined  before  it  is  used.  Clearly,  this 
property  is  preserved  in  the  element-wise  evaluation  of  the 
equivalent system of sequence equations only if the system is causal.


5. Enforcing the causality condition.

Consider  the  sequence  equation 


    $\rho_{a(i),b(j)} = \Gamma(\alpha_{i,j}, \beta_{i,j}, \ldots)$          (7)


where  r  is  a  sequence  operator  and  a(i)  and  b(j)  are  functions  of 
i  and  j,  respectively.  The  more  general  form  of  (7)  may  involve 
n  dimensional  sequence  arrays.  However,  for  simplicity,  we  res¬ 
trict  our  discussion  to  the  case  n=2.  The  extension  to  higher 
dimensions  should  be  obvious. 


Definition 4: The causality factor $\phi(t)$ of equation (7) at any t
(also referred to below as the deficiency factor) is defined as the
maximum integer such that $\rho_{a(i),b(j)}(t)$ does not depend on
any $\alpha_{i,j}(\tau), \beta_{i,j}(\tau), \ldots$, for
$\tau > t - \phi(t)$. The minimum causality factor of equation (7) is
defined by $\phi_m = \min\{\phi(t);\ t \geq 1\}$. Clearly, if (7) is
causal, then $\phi_m > 0$. □


Any data item in (7) may be associated with a position in a
3-dimensional computation space. For example, $\rho_{a(i),b(j)}(t)$
is associated with the position (t,a(i),b(j)). Moreover, if
$\phi(t)$ is the deficiency factor of (7) at t, then only data items
associated with the positions $(\tau,i,j)$,
$\tau = 1, \ldots, t-\phi(t)$, may be used to define
$\rho_{a(i),b(j)}(t)$. This motivates the following definitions:

Definition 5: The dependence pair of equation (7) at any $t \geq 1$
is a pair of vectors <v,u>, where $v = (t, a(i), b(j))$ and
$u = (t-\phi(t), i, j)$. The minimum dependence pair of (7) is the
pair <v,u_m>, where $u_m = (t-\phi_m, i, j)$. □

Definition 6: The difference vector of equation (7) at any $t \geq 1$
is the vector $v-u = (\phi(t), a(i)-i, b(j)-j)$. The minimum
difference vector of (7) is the vector
$v-u_m = (\phi_m, a(i)-i, b(j)-j)$. □

Note that any non-linearity in the difference vector along the t
dimension is absorbed in the minimum difference vector by assuming
the worst case. Note also that the first component of the difference
vector is equal to the deficiency factor.


If  equation  (7)  is  not  causal,  then  the  first  component  of 
the  minimum  difference  vector  is  not  positive.  However,  it  may 
be  possible  to  enforce  causality  by  the  application  of  some 
sequence transformation to (7). We consider two types of
transformations, namely:

Sequence spreading: A spread of equation (7) by a constant s,
$s \geq 0$, is a substitution of each sequence $\sigma_{i,j}$ in (7)
(here $\sigma = \rho, \alpha, \beta, \ldots$) by another sequence
$\hat\sigma_{i,j} = \Theta^s \sigma_{i,j}$.

Sequence skewing: A skew of equation (7) by a function w(i,j) is a
substitution of each sequence $\sigma_{i,j}$ in (7)
($\sigma = \rho, \alpha, \beta, \ldots$) by another sequence
$\hat\sigma_{i,j} = \Omega^{w(i,j)} \sigma_{i,j}$.

Theorem 1: Let equation (8) be the equation obtained from (7) by a
spread by s followed by a skew by w(i,j), that is, by substituting
each sequence $\sigma_{i,j}$ in (7) by

    $\hat\sigma_{i,j} = \Omega^{w(i,j)}\, \Theta^s\, \sigma_{i,j}$;

then, the minimum deficiency factor of equation (8) is given by

    $\hat\phi_m = (s+1)\,\phi_m + w(a(i),b(j)) - w(i,j)$.          (9)

Proof: By the definitions of the operators $\Theta$ and $\Omega$, if
$\hat\sigma_{i,j} = \Omega^{w(i,j)} \Theta^s \sigma_{i,j}$ then
$\hat\sigma_{i,j}((s+1)t - s + w(i,j)) = \sigma_{i,j}(t)$. That is,
the above transformation maps the position (t,i,j) in the computation
space into the position ((s+1)t-s+w(i,j), i, j) in the same space.
Hence, the minimum dependence pair of equation (8) is
<$\hat v$, $\hat u_m$>, where

    $\hat v   = ((s+1)t - s + w(a(i),b(j)),\ a(i),\ b(j))$
    $\hat u_m = ((s+1)(t-\phi_m) - s + w(i,j),\ i,\ j)$

from which we directly find that the first component of the minimum
difference vector, and thus the minimum deficiency factor, are given
by (9). □
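
The position mapping used in the proof can be checked on a small
example. The sketch below is an illustration under assumptions made
here: the operators are modeled on Python lists (list index t-1 holds
the item at time t), `None` plays the role of the don't care element,
only non-negative shifts are handled, and the definitions are inferred
from the mapping stated in the proof rather than taken from the
Appendix.

```python
DONT_CARE = None  # stands for the "don't care" element


def spread(seq, s):
    """Theta^s: move the item at time t to time (s+1)*t - s."""
    out = [DONT_CARE] * max(0, (s + 1) * len(seq) - s)
    for t, item in enumerate(seq, start=1):
        out[(s + 1) * t - s - 1] = item      # -1 converts time to list index
    return out


def shift(seq, k):
    """Omega^k for k >= 0: delay by k cycles, padding with don't cares."""
    if k < 0:
        raise ValueError("this sketch only handles non-negative shifts")
    return [DONT_CARE] * k + list(seq)


def spread_then_skew(seq, s, w):
    """Omega^w Theta^s: the item at time t ends up at time (s+1)*t - s + w."""
    return shift(spread(seq, s), w)
```

For instance, with s=1 and w=2 the three items of a sequence appear at
times 3, 5 and 7 of the transformed sequence.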


For  the  special  case  of  linear  transformation,  we  may  prove 
the  following  result  by  direct  substitution  in  (9). 


Corollary: In Theorem 1, let $a(i) = i + a_0(i)$ and
$b(j) = j + b_0(j)$, and let $w(i,j) = c_1 i + c_2 j$ be a linear
function; then the minimum deficiency factor of equation (8) is given
by

    $\hat\phi_m = (s+1)\,\phi_m + c_1\, a_0(i) + c_2\, b_0(j)$. □
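
The direct substitution mentioned above can be written out as
follows. This is only a worked restatement of the step, using the
symbols of Theorem 1 and of the Corollary.

```latex
\begin{align*}
w(a(i),b(j)) - w(i,j)
  &= c_1\,\bigl(i + a_0(i)\bigr) + c_2\,\bigl(j + b_0(j)\bigr)
     - \bigl(c_1 i + c_2 j\bigr) \\
  &= c_1\, a_0(i) + c_2\, b_0(j), \\
\hat\phi_m
  &= (s+1)\,\phi_m + w(a(i),b(j)) - w(i,j)
   = (s+1)\,\phi_m + c_1\, a_0(i) + c_2\, b_0(j).
\end{align*}
```

The terms in i and j cancel, so for a linear skew only the index
offsets $a_0(i)$ and $b_0(j)$ enter the transformed factor.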

Now, given a non causal system of n canonic sequence equations, let
the minimum dependence pair of the kth equation in the system be:

    <(t, i+a_k(i), j+b_k(j)) , (t-\phi_k, i, j)>,   k = 1,...,n

where not all $\phi_k$, k=1,...,n, are positive. In order to
transform the given system into a causal system, we first attempt to
find a constant s and a linear function $w(i,j) = c_1 i + c_2 j$ such
that

    $\begin{bmatrix} \phi_1 & a_1(i) & b_1(j) \\ \phi_2 & a_2(i) & b_2(j) \\ \vdots & \vdots & \vdots \\ \phi_n & a_n(i) & b_n(j) \end{bmatrix} \begin{bmatrix} s+1 \\ c_1 \\ c_2 \end{bmatrix} > 0$          (10)

where the relation > is applied element-wise. If this is possible,
then we have found a linear sequence transformation that will
transform our system into a CCS. On the other hand, if equation (10)
does not have a solution, then we should seek a non-linear function
w(i,j) such that

    $(s+1)\,\phi_k + w(a(i),b(j)) - w(i,j) > 0$   for k = 1,...,n          (11)
In  many  cases,  there  may  be  more  than  one  constant  s  and  one 
function  w(i,j)  which  satisfy  (10)  or  (11).  In  such  cases,  we 
may  choose  s  and  w  to  minimize  the  execution  time  of  the  network. 

Definition 7: Given any system of sequence equations/input-output
specifications, let $S_o$ be the set that contains the positions (in
the computation space) of the data items in the output specification
part of the system. If a spread by s followed by a skew by w(i,j)
transforms the given system into a CCS, then each position in $S_o$
is mapped into a new position. Let $\hat S_o$ contain these new
positions. The execution time $T_e$ of the systolic computation
corresponding to the CCS is then defined by

    $T_e = \max\{\, t\ ;\ (t,i,j) \in \hat S_o \,\}$
        $= \max\{\, (s+1)t - s + w(i,j)\ ;\ (t,i,j) \in S_o \,\}$

Hence, the optimal choice of s and w(i,j) is the one that minimizes
$T_e$.
e 
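
As a minimal sketch of Definition 7, the execution time can be computed directly from the output positions. The function name `execution_time` and the sample output set are assumptions made for illustration and do not come from the paper.

```python
def execution_time(output_positions, s, w):
    """T_e = max{(s+1)*t - s + w(i, j)} over the output positions (t, i, j)."""
    return max((s + 1) * t - s + w(i, j) for (t, i, j) in output_positions)

# Hypothetical output set: one item per cell i = 1..3, all produced at t = 4.
sample_outputs = [(4, i, 0) for i in range(1, 4)]

# Spread by s = 1 followed by a skew w(i, j) = 2*i.
te_skewed = execution_time(sample_outputs, s=1, w=lambda i, j: 2 * i)

# No spread and no skew leaves the original time unchanged.
te_plain = execution_time(sample_outputs, s=0, w=lambda i, j: 0)
```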

For example, consider the system of equations (4). The computation
space for this system is two dimensional and the minimum dependence
pairs for its equations are

   <(t, j+1), (t, j)>      for j=1,...,m
   <(t, j-1), (t-1, j)>    for j=1,...,m
   <(t, j-1), (t-1, j)>    for j=m+1

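which shows that the system is not causal. Hence, we look for a
constant s and a linear function $w(j) = c_2 j$ such that

$$\begin{bmatrix} 0 & 1 \\ 1 & -1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} s+1 \\ c_2 \end{bmatrix} > \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \qquad (12)$$

and $T_e = \max\{ (s+1)(t+1) - s + c_2 j \;;\; j=m,\ t=1,\ldots,n \} = (s+1)n + m c_2 + 1$
is minimum, which in this case means the smallest s and $c_2$.
Clearly, s=1 and $c_2 = 1$ satisfy the above conditions, and hence, we
use the linear transformation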

e  t.  ;  Cj  =  rP  e  c  (13. a) 

V  nl  9  ”1  :  Vl  ‘  n"+1  6  flm+l  <13b> 

More specifically, if we multiply both sides of (4.a), (4.b) and (4.c)
by $\Omega^{j+1}\Theta$, $\Omega^{j-1}\Theta$ and $\Omega^{m}\Theta$,
respectively, and use property P1 from the Appendix, we may get


3  +  1 

=  n 

9  [<j  +  VC3]' 

j  ),..., m 

0  4.  a) 

3-1 

-  n^-1 

o0  n  e  c.. 

j*  l,  . .  .  , m 

(14. b) 

-i  -■  1 

=  rP"1 

nn  n  e  [ [B. -«o/-y,] , 

j  ^m+1 

04. c) 

-j
Next,  we  replace  Cp  ^  in  (14.b/c)  by  0^  \  and  insert  fl  ^  fP
in  the  same  equations,  and  finally  use  the  definitions  (13)  to 
obtain  the  CCS  (2)  that  was  introduced  in  Section  2.  In  general, 
it  is  safe  to  replace  a  don't  care  by  a  specific  value  for  the 
sake  of  simplifying  the  expressions.  However,  the  converse  is 
not  true.  In  other  words  we  are  not  allowed  to  replace  a 
specific  value,  which  may  be  defined  in  the  original  algorithm, 
by  a  don't  care.  This  is  why  we  could  not  simplify  (14.b/c)  by 
changing  fig  into  D. 

The system (6) of Section 4 provides another example of a non causal
system. Its dependence pairs are <(t, i), (t-1, i)> and
<(t, i+1), (t+1, i)>, and the output set is $S_o = \{ (m, i+1) \;;\; i=1,\ldots,n \}$.
For this system, a linear transformation with s=0 and w(i)=2i is
optimal. Hence, we let

°i  =  noa  ai’  °  = 

and  multiply  (6. a)  and  (6.b)  by  n g*  and  ng^i  +  ^,  respectively. 
Then  we  use  property  2  form  the  appendix  to  obtain  the  following 
CCS: 


INPUT{  -  t  ;  yi  =  Og1  y.  ;  $,  =  fig1  0. 

where  y  ^  and  /T  ,  i=l,...,n,  are  as  in  (5.a/b)  }; 


«i  -  n0  ^Tl,[0]  <  +  ,  5*) 

<i+i  -  no  “zI+iTcoi  (  ■ 

OUTPUT!  x.  -  C i+l(m+2i+2) ,  i-l,...,n  )  }. 


* 

6  ) 


i-  1,  .  .  . , n 

i -1 , . . . , n 


This CCS specifies the network of Figure 3 which has n computational
cells. Each cell i starts, at time 2i+1, the computation of the value
of $x_i$ in an internal register. The content
of  this  register  is  described  by  the  sequence  i ^  associated  with 
the  feed  back  link  x^.  After  m+1  time  units,  the  cell  terminates 
its  computation  and  the  computed  value  of  x^  is  passed  to  the 
following  cells  i+l,...,n,  on  the  output  link  y.  .. 


Fig 3 - A forward substitution network with n computational cells


6. Example 1: A multistage shortest path network

Consider an S stage graph where each stage s, 0 ≤ s ≤ S, consists of
$n_s$ nodes, with $n_0 = n_S = 1$. For each edge directed from a node j,
1 ≤ j ≤ $n_{s-1}$, in stage s-1 to a node i, 1 ≤ i ≤ $n_s$, in stage s,
we are given a cost $a^s_{i,j}$, and the problem is to find the minimum
cost of a path from the initial node (node 1 in stage 0) to the
terminal node (node 1 in stage S).

In order to solve the problem, we let $m = \max\{ n_s : s=0,\ldots,S \}$,
and we assume that $a^s_{i,j} = \infty$ if there is no path from node j
in stage s-1 to node i in stage s, or if $n_{s-1} < j \le m$ and/or
$n_s < i \le m$, that is, if either of the two nodes does not exist.


In the following algorithm, the solution proceeds by finding at each
stage s and for each node i in s the minimum cost $C^s_i$ of a path
from the initial node to node i. Each $C^s_i$ is computed
progressively in y(i,j,s). (We denote min{x,y} by x@y.)

INPUT{ y(i,m+1,0) = 0,          i=1,...,m ;
       a(i,j,s) = a^s_{i,j},    i,j=1,...,m, s=1,...,S };
DO s=1,S
  DO i=1,m
    DO j=1,m
    { IF j=1 THEN y(i,j+1,s) = y(j,m+1,s-1) + a(i,j,s);
      IF j>1 THEN y(i,j+1,s) = y(i,j,s) @ (y(j,m+1,s-1) + a(i,j,s)) };
OUTPUT{ C^S_1 = y(1,m+1,S) }.

Although each variable in the above algorithm is an element of a
three dimensional array, the algorithm does violate CA1 of canonic
algorithms. Namely, y(j,m+1,s-1) is not an (i,j,s)th element of an
array. Hence, we let y(j,m+1,s-1) = x(i,j,s), and use the expansion

   x(1,j,s+1) = y(j,m+1,s) ;   x(i+1,j,s) = x(i,j,s)

This gives the following algorithm:

INPUT{ x(1,j,1) = 0,            j=1,...,m ;
       a(i,j,s) = a^s_{i,j},    i,j=1,...,m, s=1,...,S };
DO s=1,S
  DO i=1,m
    DO j=1,m
    { x(i+1,j,s) = x(i,j,s) ;
      IF j=1   THEN y(i,j+1,s) = x(i,j,s) + a(i,j,s) ;
      IF 1<j<m THEN y(i,j+1,s) = y(i,j,s) @ (x(i,j,s) + a(i,j,s)) ;
      IF j=m   THEN x(1,i,s+1) = y(i,j,s) @ (x(i,j,s) + a(i,j,s)) };
OUTPUT{ C^S_1 = x(1,1,S+1) }.
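
The recurrence behind both listings can be sketched as an ordinary program. This is a minimal sketch, not the systolic form: it assumes 1-based nested lists with unused index 0, `math.inf` for missing edges or nodes, and the names `multistage_shortest_path` and `cost_two_stage` are hypothetical.

```python
import math

INF = math.inf

def multistage_shortest_path(a, m, S):
    """a[s][i][j]: cost of the edge from node j (stage s-1) to node i (stage s)."""
    prev = [INF] + [0] * m               # y(i, m+1, 0) = 0 for i = 1..m
    for s in range(1, S + 1):
        cur = [INF] * (m + 1)
        for i in range(1, m + 1):
            acc = INF
            for j in range(1, m + 1):    # y(i, j+1, s) accumulates the minimum over j
                acc = min(acc, prev[j] + a[s][i][j])
            cur[i] = acc
        prev = cur
    return prev[1]                        # cost of reaching node 1 in stage S

# Hypothetical 2-stage example with m = 2 (stage 1 has two nodes).
cost_two_stage = {
    1: [[INF] * 3, [INF, 1, INF], [INF, 2, INF]],
    2: [[INF] * 3, [INF, 4, 1], [INF, INF, INF]],
}
best_two_stage = multistage_shortest_path(cost_two_stage, m=2, S=2)
```

The two candidate paths cost 1 + 4 and 2 + 1, so the sketch should select the second one.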

This algorithm, however, violates condition CA3 because, for j=m, the
index i is used to select an element of the x array along the second
dimension, which is associated with the index j. In order to overcome
this problem, we may use the y and x arrays, alternately, to
accumulate the partial costs at successive stages. More specifically,
we rewrite the algorithm in the following canonic form:

INPUT{ x(1,j,1) = 0,            j=1,...,m ;
       a(i,j,s) = a^s_{i,j},    i,j=1,...,m, s=1,3,5,... ;
       b(i,j,s) = a^s_{j,i},    i,j=1,...,m, s=2,4,6,... };
DO s=1,S
{ IF (s = odd) THEN
    DO i=1,m
      DO j=1,m
      { x(i+1,j,s) = x(i,j,s) ;
        IF j=1   THEN y(i,j+1,s) = x(i,j,s) + a(i,j,s);
        IF 1<j<m THEN y(i,j+1,s) = y(i,j,s) @ (x(i,j,s) + a(i,j,s));
        IF j=m   THEN y(i,1,s+1) = y(i,j,s) @ (x(i,j,s) + a(i,j,s)) };
  IF (s = even) THEN
    DO j=1,m
      DO i=1,m
      { y(i,j+1,s) = y(i,j,s) ;
        IF i=1   THEN x(i+1,j,s) = y(i,j,s) + b(i,j,s);
        IF 1<i<m THEN x(i+1,j,s) = x(i,j,s) @ (y(i,j,s) + b(i,j,s));
        IF i=m   THEN x(1,j,s+1) = x(i,j,s) @ (y(i,j,s) + b(i,j,s)) }
} ;
OUTPUT{ C^S_1 = IF S is odd THEN y(1,1,S+1) ELSE x(1,1,S+1) }.
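
The alternating use of the x and y arrays can be simulated directly. The following is a minimal sketch that stores every array element in a dictionary keyed by (i, j, s); it assumes m ≥ 2, uses `math.inf` for missing edges, and the names `canonic_shortest_path` and `example_cost` are hypothetical.

```python
import math

INF = math.inf

def canonic_shortest_path(cost, m, S):
    """Simulate the canonic listing; cost[s][i][j] = a^s_{i,j} (1-based)."""
    x, y = {}, {}
    for j in range(1, m + 1):
        x[(1, j, 1)] = 0
    for s in range(1, S + 1):
        if s % 2 == 1:                   # odd stage: x moves along i, y accumulates along j
            for i in range(1, m + 1):
                for j in range(1, m + 1):
                    x[(i + 1, j, s)] = x[(i, j, s)]
                    term = x[(i, j, s)] + cost[s][i][j]      # a(i,j,s)
                    if j == 1:
                        y[(i, j + 1, s)] = term
                    elif j < m:
                        y[(i, j + 1, s)] = min(y[(i, j, s)], term)
                    else:
                        y[(i, 1, s + 1)] = min(y[(i, j, s)], term)
        else:                            # even stage: the roles of x and y are exchanged
            for j in range(1, m + 1):
                for i in range(1, m + 1):
                    y[(i, j + 1, s)] = y[(i, j, s)]
                    term = y[(i, j, s)] + cost[s][j][i]      # b(i,j,s) = a^s_{j,i}
                    if i == 1:
                        x[(i + 1, j, s)] = term
                    elif i < m:
                        x[(i + 1, j, s)] = min(x[(i, j, s)], term)
                    else:
                        x[(1, j, s + 1)] = min(x[(i, j, s)], term)
    return y[(1, 1, S + 1)] if S % 2 == 1 else x[(1, 1, S + 1)]

# Hypothetical 3-stage example with m = 2.
example_cost = {
    1: [[INF] * 3, [INF, 1, INF], [INF, 2, INF]],
    2: [[INF] * 3, [INF, 4, 1], [INF, 1, 7]],
    3: [[INF] * 3, [INF, 5, 1], [INF, INF, INF]],
}
canonic_result = canonic_shortest_path(example_cost, m=2, S=3)
```

In this example the stage costs are (1, 2), then (3, 2), and the final cost is min(3 + 5, 2 + 1).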

Now, we may choose both i and s to represent the time and compress the
arrays along these dimensions. More specifically, we first compress
the arrays along the i dimension by defining the sequences

   $\xi^s_j(i) = x(i,j,s)$       j=1,...,m,  s=1,...,S
   $\eta^s_j(i) = y(i,j,s)$      j=1,...,m,  s=1,...,S
   $\gamma^s_j(i) = a(i,j,s)$    j=1,...,m,  s=1,3,...
   $\gamma^s_j(i) = b(i,j,s)$    j=1,...,m,  s=2,4,...          (15.a)

and then compress the sequences $\xi^s_j$, $\eta^s_j$ and $\gamma^s_j$,
s=1,...,S, along the s dimension by defining the sequences

   $\xi_j = P^{m}_{s=1,S}(\xi^s_j)$        j=1,...,m
   $\eta_j = P^{m}_{s=1,S}(\eta^s_j)$      j=1,...,m
   $\gamma_j = P^{m}_{s=1,S}(\gamma^s_j)$  j=1,...,m            (15.b)

Note that the elements of the sequences $\eta^s_1$ for s=odd are not
defined by the canonic algorithm. These elements, however, are not
used in the algorithm, and hence may be set to the don't care element
$\delta$. With this, the two step compression leads to the following
canonic system of sequence equations:


INPUT{  Yy  j=l,...,m,  as  given  by  (15)}  ; 


3  *  n0  (}  .  nj*y)  . 

^  —  If  •  •  •  t  m 

(16. a) 

3  +  i  ‘  V73  '  V 

j-L 

(16. b) 

3+1  ‘  M"'m(  V'W  '  ”3  ' 

j-2  f • * • f  m- 1 

(16. c) 

j-m+i  “  7  nm7?.@(nm^.  +  nmy.]  ) 

1! 

3 

(16. d) 

OUTPUT{ $C^S_1$ = IF S is odd THEN $\eta_1(mS+1)$ ELSE $\xi_1(mS+1)$ }.

The system (16) is not causal. More specifically, its equations have
the following dependence pairs:

   <(t, j), (t-1, j)>           j=1,...,m        (17.a)
   <(t, j+1), (t, j)>           j=1              (17.b)
   <(t, j+1), (t, j)>           j=2,...,m-1      (17.c)
   <(t, j-m+1), (t-m, j)>       j=m              (17.d)


In order to enforce causality via a linear sequence transformation, we
must find two constants p and $c_2$, such that

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ m & -m+1 \end{bmatrix} \begin{bmatrix} p+1 \\ c_2 \end{bmatrix} > \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

and $T_e = (p+1)(mS+1) - p + c_2$ is minimum. Clearly, p=0 and $c_2 = 1$
satisfy the above conditions. That is, causality may be enforced in
(16) via the following substitutions:

   $\bar{\xi}_j = \Omega^{j}\xi_j$ ;  $\bar{\eta}_j = \Omega^{j}\eta_j$ ;  $\bar{\gamma}_j = \Omega^{j}\gamma_j$ ,   j=1,...,m     (18)

More specifically, we first multiply (16.a) by $\Omega^{j}$, (16.b/c)
by $\Omega^{j+1}$ and (16.d) by $\Omega$. Then, we use property P2 from
the Appendix to interchange the $\Omega$ and M operators, and finally,
we use (18) to obtain the following CCS:

INPUT{  V.  =  y.,  j- 1, . . . ,m  ) ; 

7^  -  nj  nQ  n_j  M^;j;,m"1(7J  ,  ^  ,  TjiC^+^l)  i~i - (i9.a) 

V p  -  n  M^'m(V1  ,  V})  3=3  (19. b) 

T7j  +  1  >■  n  M™j™(  7?^§[^+7j]  ,  7?j  )  3*2,  .  .  .  , m- 1  (19. c) 

V i  -  n  M^'m(  6*  ,  ^eiVj+T.])  j-m  (39. d) 

OUTPUT{ $C^S_1$ = IF S is odd THEN $\bar{\eta}_1(mS+2)$ ELSE $\bar{\xi}_1(mS+2)$ }.
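
As a check of the choice p = 0 and $c_2 = 1$, each dependence pair in (17) can be evaluated against the causality condition row by row. This worked verification is added for illustration and uses only the quantities defined above:

```latex
\begin{aligned}
(17.a):\quad & 1\cdot(p+1) + 0\cdot c_2 = 1 > 0,\\
(17.b),\,(17.c):\quad & 0\cdot(p+1) + 1\cdot c_2 = 1 > 0,\\
(17.d):\quad & m\,(p+1) + (1-m)\,c_2 = m + 1 - m = 1 > 0,\\
T_e \;=\; & (p+1)(mS+1) - p + c_2 = mS + 2 .
\end{aligned}
```

The value mS+2 agrees with the time index that appears in the OUTPUT specification of the CCS (19).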

The above CCS describes a linear network of m cells (see Fig 4), where
each cell j contains an accumulator (call it $X_j$) whose content is
described by $\bar{\xi}_j$. The operation of each cell j alternates
between two phases. In an odd phase, $C^s_j$ is stored in $X_j$ and the
cell contributes in the computation of $C^{s+1}_i$, i=1,...,m, where
each $C^{s+1}_i$ is computed progressively on the y links by picking up
contributions from the different cells. In the next phase, the
computed $C^{s+1}_i$, i=1,...,m, circulate unchanged on the y links
while cell j computes $C^{s+2}_j$ in its accumulator. The precise
operation of each cell is given by (19).

Fig 4 - A multistage, shortest path network


Note that the term $\Omega^{j}\,\Omega_0\,\Omega^{-j}$ in equation (19.a)
indicates that the content of the accumulator at cell j is reset to
zero at the (j+1)th cycle. In order to simplify this equation, we may
reset the accumulator to zero at the first cycle and maintain this
zero for the first j+1 cycles. This is equivalent to the replacement
of (19.a) in the CCS (19) by


%  -  "j  •  > 


(19. e) 


A network very similar to the one described in this section is given
in [19].


7. EXAMPLE 2: A network for dynamic programming


Consider the following optimal parenthesization problem [4]:

Given $c_{i,i}$, i=1,...,n, find $c_{1,n}$, where

$$c_{i,i+j} = \min\{\, f(c_{i,i+k-1},\, c_{i+k,i+j}) \;;\; k \in \{1,\ldots,j\} \,\} \qquad (20)$$

for j=1,...,n-1, i=1,...,n-j and a given function f.

The order in which the minimum is evaluated over the set of indices
K = {1,...,j} is not specified by the problem. The simplest order is
the sequential order k=1,...,j. It is possible to write an algorithm
using this order and then apply our technique to derive an equivalent
systolic computation. The resulting network, however, does not overlap
the computation of $c_{i,i+j}$ for different j, and hence has an
execution time $T_e = O(n^2)$.

An alternative order for the evaluation of (20) is to start from the
middle of the interval [1,j], namely from ℓ = (j+1)÷2, and proceed
towards the boundaries of [1,j] in the two directions ℓ-k and ℓ+k,
k=1,2,..., simultaneously. This is described, more precisely, by the
algorithm shown in Fig 5, where x@y denotes min{x,y}, and x÷y denotes
the quotient of x/y.
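
Since the minimum in (20) does not depend on the order in which k is visited, the sequential and the middle-out orders must give the same result. The following minimal sketch checks this on a small instance; the names `middle_out_order` and `parenthesize`, the cost function `f` and the sample data are assumptions made for illustration only.

```python
def middle_out_order(j):
    """Split points k in {1..j}, visited from the middle l = (j+1)//2 outwards."""
    l = (j + 1) // 2
    if j % 2 == 1:
        order = [l]
        for d in range(1, l):
            order += [l - d, l + d]
    else:
        order = []
        for d in range(1, l + 1):
            order += [l - d + 1, l + d]
    return order

def parenthesize(c_diag, f, order):
    """Evaluate (20) for all i, j, taking k in the sequence order(j)."""
    n = len(c_diag)
    c = {(i, i): c_diag[i - 1] for i in range(1, n + 1)}
    for j in range(1, n):
        for i in range(1, n - j + 1):
            c[(i, i + j)] = min(f(c[(i, i + k - 1)], c[(i + k, i + j)])
                                for k in order(j))
    return c[(1, n)]

def f(u, v):
    # Hypothetical cost function; any function of two costs would do here.
    return u + v + abs(u - v)

diag = [3, 1, 4, 1, 5]
sequential = parenthesize(diag, f, lambda j: range(1, j + 1))
middle_out = parenthesize(diag, f, middle_out_order)
```

The benefit of the middle-out order lies in the overlap it allows between successive values of j in the network, not in the computed value.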

The algorithm of Fig 5 is not canonic. However, it may be rewritten in
the canonic form shown in Fig 6 by using the following substitutions
along with the appropriate expansions:

   $m(i,j,k) = h^{k}_{i,i+j}$

   $z(i,j,k) = c_{i,\,i+(j\div 2)+k-1}$ ;   $w(i,j,k) = c_{i+(j\div 2)+k,\,i+j}$


INPUT{ c_{i,i} ,  i=1,...,n };
DO j=1,n-1
  DO i=1,n-j
    DO k=1,ℓ+1        /* ℓ is the quotient of (j+1)÷2 */
    { IF (j = odd) THEN
      { IF k=1   THEN h^{k+1}_{i,i+j} = f(c_{i,i+ℓ-1} , c_{i+ℓ,i+j}) ;
        IF 1<k≤ℓ THEN h^{k+1}_{i,i+j} = h^{k}_{i,i+j} @ f(c_{i,i+ℓ-k} , c_{i+ℓ-k+1,i+j})
                                                      @ f(c_{i,i+ℓ+k-2} , c_{i+ℓ+k-1,i+j}) ;
        IF k=ℓ+1 THEN c_{i,i+j} = h^{k}_{i,i+j}
      } ;
      IF (j = even) THEN
      { IF k=1   THEN h^{k+1}_{i,i+j} = f(c_{i,i+ℓ-k} , c_{i+ℓ-k+1,i+j})
                                        @ f(c_{i,i+ℓ+k-1} , c_{i+ℓ+k,i+j}) ;
        IF 1<k≤ℓ THEN h^{k+1}_{i,i+j} = h^{k}_{i,i+j} @ f(c_{i,i+ℓ-k} , c_{i+ℓ-k+1,i+j})
                                                      @ f(c_{i,i+ℓ+k-1} , c_{i+ℓ+k,i+j}) ;
        IF k=ℓ+1 THEN c_{i,i+j} = h^{k}_{i,i+j}
      } ;
    } ;
OUTPUT{ c_{1,n} = h^{L+1}_{1,n} , where L = n÷2 }.

Fig 5 - An algorithm for dynamic programming.


INPUT{ z(i,1,1) = c_{i,i} , y(i,1,1) = c_{i+1,i+1} ,  i=1,...,n-1 };
DO j=1,n-1
  DO i=1,n-j
    DO k=1,ℓ+1
    { IF (j = odd) THEN
      { IF k=1   THEN { x(i,j+1,k) = z(i,j,k) ; w(i-1,j+1,k) = y(i,j,k) ;
                        m(i,j,k+1) = f(z(i,j,k) , y(i,j,k)) } ;
        IF 1<k≤ℓ THEN { x(i,j+1,k) = x(i,j,k) ; w(i-1,j+1,k) = w(i,j,k) ;
                        z(i,j+1,k-1) = z(i,j,k) ; y(i-1,j+1,k-1) = y(i,j,k) ;
                        m(i,j,k+1) = m(i,j,k) @ f(x(i,j,k) , y(i,j,k))
                                              @ f(z(i,j,k) , w(i,j,k)) } ;
        IF k=ℓ+1 THEN { z(i,j+1,k-1) = m(i,j,k) ; y(i-1,j+1,k-1) = m(i,j,k) } ;
      } ;
      IF (j = even) THEN
      { IF k=1   THEN { x(i,j+1,k+1) = x(i,j,k) ; w(i-1,j+1,k+1) = w(i,j,k) ;
                        z(i,j+1,k) = z(i,j,k) ; y(i-1,j+1,k) = y(i,j,k) ;
                        m(i,j,k+1) = f(x(i,j,k) , y(i,j,k)) @ f(z(i,j,k) , w(i,j,k)) } ;
        IF 1<k≤ℓ THEN { x(i,j+1,k+1) = x(i,j,k) ; w(i-1,j+1,k+1) = w(i,j,k) ;
                        z(i,j+1,k) = z(i,j,k) ; y(i-1,j+1,k) = y(i,j,k) ;
                        m(i,j,k+1) = m(i,j,k) @ f(x(i,j,k) , y(i,j,k))
                                              @ f(z(i,j,k) , w(i,j,k)) } ;
        IF k=ℓ+1 THEN { z(i,j+1,k) = m(i,j,k) ; y(i-1,j+1,k) = m(i,j,k) } ;
      } ;
    } ;
OUTPUT{ c_{1,n} = m(1,n-1,L+1) , where L = n÷2 }.

Fig 6 - A canonic algorithm for dynamic programming.


   $x(i,j,k) = c_{i,\,i+\ell-k}$ ;   $y(i,j,k) = c_{i+\ell-k+1,\,i+j}$

Next, we choose k to represent the time and we define the sequences
$\mu_{i,j}$, $\xi_{i,j}$, $\eta_{i,j}$, $\zeta_{i,j}$ and $\omega_{i,j}$
to contain the elements of the arrays m, x, y, z and w, respectively,
along the k dimension. We also define the element-wise sequence
operator $\phi$ such that $[\phi(\xi,\eta)](t) = f(\xi(t), \eta(t))$.
With this, we may compact the canonic algorithm along the k dimension
and obtain the following canonic system of sequence equations:


INPUT{  «lfl(l)-ciri  ,  1ifl<U  -  ci+lfi+1  i-1 . n~l ' 


*  i,  l^'^i,  l(t)=0'  fc>1,  -  *n-l} ; 

FOR  j=l,...,n-l  and  i=l,...,n-j 


(7.1) 


n  m 


l,i-i,« 


(*(Ci, j,7,i, j)  '  Mi,j@iri,j  '  6  1  if  j  is  odd 


nMl'i"1'“(^i,j  '  *i,j@'i,j  ' 


if  j  is  even 


(72) 


where  t=(j+l)*2  and  1*.  i  =  0(«.  .,7?.  ) 

A  *  J  •L*3  A  *  J  ■L*j  1  *  J 


€i, j+1 


•j+1 


Ci ,  j+1 


i-1, j+1 


,  lit)  ,  a*) 

if 

j 

is 

odd 

‘‘i.l 

if 

j 

is 

even 

1  1  —  1  GO  * 

(M  (,1, j  •  "i,J  '  8  > 

if 

j 

is 

odd 

ltd .  . 

1*3 

if 

j 

is 

even 

rMA"1,1'"(n"1ci  .  ,  n~1n. 

i  *  3  i  *  3 

* 

r  8  ) 

if 

j 

is 

odd 

Ml,1'~«i,j  '  *i,3  -  a‘> 

if 

j 

is 

even 

m1-1 ' 1 ,0°(n-1^ j. t j  -  n 

* 

r  8  ) 

if 

j 

is 

odd 

Ml,1'"(,»i.j  '  *1.)  ’ 

if 

j 

is 

even 

_  -  fi,  ,(!.+!),  where  L.  * 

nr  2  ); 

(23) 


(24) 


(25) 


(26) 


V 


The dependence pairs for equations (22), (23) and (24) are,
respectively,

   <(t, i, j), (t-1, i, j)>          j=1,2,...
   <(t, i, j+1), (t, i, j)>          j=1,2,...
   <(t, i-1, j+1), (t, i, j)>        j=1,2,...

The dependence pairs for equations (25) are

   <(t, i, j+1), (t+1, i, j)>        j=1,3,...        (27.a)
   <(t, i, j+1), (t, i, j)>          j=2,4,...        (27.b)

and the dependence pairs of equations (26) are

   <(t, i-1, j+1), (t+1, i, j)>      j=1,3,...        (28.a)
   <(t, i-1, j+1), (t, i, j)>        j=2,4,...        (28.b)

As indicated by the dependence pairs, not all the equations in the
system are causal. However, by applying Corollary 1 of Section 5, we
may check that a linear skew of the equations with $\Omega^{2j}$
transforms the system into a causal one that has an execution time
$T_e = 2n+L-1$, where L = n÷2.

An interesting remark is that, in the absence of the pairs (27.a) and
(28.a), a skew of the form $\Omega^{j}$ is sufficient to enforce
causality. In other words, the factor of two is only needed for the
case 'j=odd'. For this reason we may try to apply a non linear skew of
the form $\Omega^{q(j)}$ where

$$q(j) = \begin{cases} q_1(j) = (3j-1)\div 2 - 1 & \text{if } j \text{ is odd} \\ q_2(j) = 3j\div 2 - 1 & \text{if } j \text{ is even} \end{cases}$$

The application of Theorem 1 indicates that a skew of the system
(22)-(26) with q(j) will enforce causality. For example, the pairs
(28.a/b) are mapped to

   <(t+q_2(j+1), i-1, j+1), (t+1+q_1(j), i, j)>     j=1,3,...
   <(t+q_1(j+1), i-1, j+1), (t+q_2(j), i, j)>       j=2,4,...

from which we find that the minimum deficiency factors of equations
(26), after transformation, are $q_2(j+1) - q_1(j) - 1 = 1$ and
$q_1(j+1) - q_2(j) = 1$, for j=odd and j=even, respectively.
Similarly, we can show that the minimum deficiency factors of the
other equations are all equal to unity. The execution time of the
transformed system is given by $T_e = L+1+q(n-1) = 2n-2$.
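
The arithmetic behind these claims is easy to check numerically. The sketch below encodes q(j) with integer division standing for ÷; the function names `q` and `skewed_execution_time` are hypothetical and introduced only for illustration.

```python
def q(j):
    """Non linear skew: (3j-1)//2 - 1 for odd j, 3j//2 - 1 for even j."""
    return (3 * j - 1) // 2 - 1 if j % 2 == 1 else (3 * j) // 2 - 1

def skewed_execution_time(n):
    """T_e = L + 1 + q(n-1) with L = n // 2."""
    return n // 2 + 1 + q(n - 1)

# q grows by 2 after an odd j and by 1 after an even j, which is what the
# pairs (27.a)/(28.a) and (27.b)/(28.b) require, respectively.
increments = [q(j + 1) - q(j) for j in range(1, 9)]
times = {n: skewed_execution_time(n) for n in range(2, 21)}
```

Compared with the linear skew, whose execution time is 2n+L-1, the non linear skew saves L+1 time units.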

Hence, a substitution of the form

   $\bar{\sigma}_{i,j} = \Omega^{q(j)}\,\sigma_{i,j}$ ,   $\sigma = \mu, \xi, \eta, \zeta$ or $\omega$

in the system (22)-(26) gives the following CCS:


INPUT{  C  i  =  C  -j  r  v  ,  -  T)  .  .  ,  i-1,  .  .  .  , n-1, 

1,1  1,1  1,1  1,1 

where  C  •>  and  t\  .  .  are  as  in  (21)  } ; 

1,1  If  1 


FOR  j=l,...,n-l  and  i=l,...,n-j 


°  '  •i.jW’t.j  ■  8*>  if  3  is  odd 


1 ' ^  M1:J.^:"(^.  .  ,  n.  .§*.  .  ,  s*) 

q2(j)+lvrl,j  ^i,jcri,j 

where  i  =  (j  +  l)-r2  and  f  =  <p(.i  .)@0(C.  ,,gj.  .) 

1  f  J  A  f  J  1  r  J  1  ’  J  1  r  J 


if  j  is  even 


i  ,  j+1 


2  1  f  - 1  co  -  -  —  * 

n  M  w  •  W,  U  •  •  #  z  •  .  0  ) 
ql(3)+lvvi,j  ' 


O  l  . 
1  f  J 


if  j  is  odd 
jf  j  is  even 


w 


i-l»  j  +  1 


2  1  i-1  CO  —  —  * 

n  m  ; , .  r'  (1? .  .  ,  w .  .  ,  o  ) 
ql( j)+lv  ’  i, j  l,) 


n  u . 

1  <  j 


if  j  is  odd 
if  j  is  even 


i,  j  +  1 


f  -  1  1  CO  "  --  * 

"  ■  *lwi  -  °  ) 

•  “i.j  -  »*> 


if  j  is  odd 
if  j  is  even 


ql( 3 ) +2  i , j  '  i,  j  '  w  ' 
^i-l, j+1  "j  Mi,l,“  “  a*) 

ln  Mq2( j )+l  ^i, j  '  *i,j  '  0  > 
OUTPUT{ $c_{1,n} = \bar{\mu}_{1,n-1}(2n-2)$ }.


if  j  is  odd 
if  j  is  even 


The above CCS specifies the network shown in Fig 7.a, which was first
introduced by Kung and Guibas in [4]. The structure of each cell may
be directly derived from the operators in the causal equations. As an
example, we show in Fig 7.b the internal details of a cell (i,j),
j=even. Note that the circuits for the outputs on $z_{i,j+1}$ and
$w_{i-1,j+1}$ are similar to those for $y_{i-1,j+1}$ and $x_{i,j+1}$,
respectively, and, hence, are not shown in the figure.


Fig 7.a - A systolic network for dynamic programming


8. Concluding remarks

The  sequence  model  introduced  in  [14]  for  the  verification 
of  systolic  computations  is  applied  in  this  paper  to  the  sys¬ 
tematic  design  of  such  computations.  Given  an  algorithm  for  the 
solution  of  a  specific  problem,  the  first  step  in  the  design 
technique  is  the  transformation  of  the  algorithm  into  a  canonic 
form.  Then,  the  algorithm  is  rewritten  as  a  system  of  sequence 
equations  and  finally,  a  sequence  transformation  is  used  to 
enforce  causality  and  produce  a  complete  specification  of  a  net¬ 
work  which  executes  the  original  algorithm. 


The technique is applicable to self-timed computations as well as
systolic computations. More specifically, it was shown in [13] that
self-timed networks may be specified by systems of weakly causal
equations, where the minimum deficiency factor of each equation is
non-negative rather than positive. Hence, in the last step of our
technique, a sequence transformation that enforces only weak causality
should produce the specification of a self-timed computation.


The order of associating operands to operators in the original
algorithm is crucial and may lead to different systolic computations
that solve the same problem. For example, in the dynamic programming
algorithm of Section 7, different networks may be obtained by
considering different orders for the evaluation of equation (20) over
the set of indices K = {1,...,j}. Also, in ALG1 of Section 2, if the
order of the summation is reversed, that is, if
$\sum_{j=1}^{m} a_{i,i+j-m-1}\, x_{i+j-m-1}$ is replaced by
$\sum_{k=1}^{m} a_{i,i-k}\, x_{i-k}$, then the resulting algorithm does
not have any systolic realization. In order to overcome this problem,
it is essential to find a suitable notation to express generic
algorithms, namely algorithms in which the orders of evaluation of the
operations are not specified, and then to introduce a design technique
which derives the order that leads to the optimal design.

Finally, we should note that the sequence transformations used in this
paper are time independent. More specifically, we used transformations
of the form $\bar{\sigma}_{i,j} = \Omega^{w(i,j)}\,\Theta^{s}\,\sigma_{i,j}$,
where s is a constant. A more general transformation may be obtained
by assuming that s=s(t) is a function of time, that is, the elements
of the sequences are spread non uniformly. However, we did not find
any example where this time dependent spreading is useful and hence we
have chosen to simplify our notation by keeping s constant.


I would like to thank Concettina Guerra for the long discussions that
stimulated this research. The use of the multistage shortest path
network as an example was also her idea.

In this appendix, we define the sequence operators that are used in
the paper and we introduce some of their properties. Let $R_\delta$ be
the set of all sequences defined on $R \cup \{\delta\}$, where R is the
set of real numbers, and $\delta$ is a special element called the
don't care element.


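1) $\delta$-regular, element-wise operators: Any binary operator 'op'
defined on R may be extended to $R_\delta$ by applying it, element
wise, to elements of sequences, with $\delta$ being the result of any
operation involving $\delta$. More specifically,

$$[\xi \;\text{'op'}\; \eta](t) = \begin{cases} \delta & \text{if } \xi(t)=\delta \text{ or } \eta(t)=\delta \\ \xi(t) \;\text{'op'}\; \eta(t) & \text{otherwise} \end{cases}$$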


2) The shift operator: $\Omega_x^r : R_\delta \to R_\delta$, is defined by

$$[\Omega_x^r\, \zeta](t) = \begin{cases} x & \text{if } t \le r \\ \zeta(t-r) & \text{if } t > r \end{cases}$$

More descriptively, if r is positive, then $\Omega_x^r$ inserts r
elements, each equal to x, at the beginning of its operand. For
example, if

$$\zeta = z_1, z_2, z_3, \ldots \qquad (29)$$

then $\Omega_x^2\, \zeta = x, x, z_1, z_2, \ldots$ Note that $\Omega_x^r$
may be used to model a cell which maintains x on its output for r time
units, and delays its input by r units. For simplicity, we omit r if
it is unity, and x if it is $\delta$.

On the other hand, if r is negative, then $\Omega_x^r$ trims the first
-r elements of its operand. For example, with $\zeta$ of (29),
$\Omega^{-2}\, \zeta = z_3, z_4, \ldots$ Note that $\Omega^{-r}$ may be
used as an inverse to $\Omega_x^r$. More specifically,
$\Omega^{-r}\, \Omega_x^r\, \zeta = \zeta$. However, the converse is not
always true, that is, $\Omega_x^r\, \Omega^{-r}\, \zeta = \zeta$ only if
the first r elements of $\zeta$ are equal to x.
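
The behaviour of the shift operator on a finite prefix of a sequence can be sketched as follows. Representing a sequence by a Python list and the don't care element by `None` are assumptions of this sketch, and the name `shift` is hypothetical.

```python
DONT_CARE = None  # stands for the don't care element

def shift(seq, r, x=DONT_CARE):
    """Shift a finite prefix: r > 0 inserts r copies of x, r < 0 trims -r elements."""
    if r >= 0:
        return [x] * r + list(seq)
    return list(seq)[-r:]

zeta = ["z1", "z2", "z3", "z4"]
delayed = shift(zeta, 2, "x")             # x, x, z1, z2, z3, z4
trimmed = shift(zeta, -2)                 # z3, z4
round_trip = shift(shift(zeta, 2, "x"), -2)
not_inverse = shift(shift(zeta, -2), 2, "x")
```

The last two lines illustrate the asymmetry noted above: trimming after inserting restores the operand, while inserting after trimming does so only if the trimmed elements were equal to x.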

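3) The spread operator: $\Theta^s : R_\delta \to R_\delta$ is defined as follows:

$$[\Theta^s\, \zeta](t) = \begin{cases} \zeta((t+s)\div(s+1)) & t = 1,\ s+2,\ 2s+3,\ \ldots,\ (i-1)s+i,\ \ldots \\ \delta & \text{otherwise} \end{cases}$$

In other words, $\Theta^s$ inserts s don't care elements between
successive elements of its operand. With $\zeta$ of (29), we have

$$\Theta^2\, \zeta = z_1, \delta, \delta, z_2, \delta, \delta, z_3, \ldots$$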
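
A minimal sketch of the spread operator on a finite prefix, again assuming a list representation with `None` as the don't care element (the name `spread` is hypothetical):

```python
def spread(seq, s, dont_care=None):
    """Insert s don't care elements between successive elements of seq."""
    out = []
    for idx, item in enumerate(seq):
        if idx > 0:
            out.extend([dont_care] * s)
        out.append(item)
    return out

zeta = ["z1", "z2", "z3"]
spread_two = spread(zeta, 2)

# The i-th element (1-based) lands at time (i-1)*s + i, as in the definition.
positions = [spread_two.index(z) + 1 for z in zeta]
```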

4) The Multiplexing operator: $M_{r[x]}^{w_1,\ldots,w_n} : [R_\delta]^n \to R_\delta$,
is defined to model a multiplexer that has n inputs. It starts
operation at time r (1 if r is omitted), and, periodically, samples
its inputs with the ratio $w_1 : \ldots : w_n$. The output for the first
r-1 time units is set to x ($\delta$ if x is omitted). More
specifically, if $K = w_1 + \ldots + w_n$ is the multiplexing period, then

$$[M_{r[x]}^{w_1,\ldots,w_n}(\xi_1,\ldots,\xi_n)](t) = \begin{cases} x & \text{if } t < r \\ \xi_e(t) & \text{if } t \ge r \end{cases}$$

where e is the smallest integer between 1 and n such that the
remainder of $(t-r) \div K$ is less than $w_1 + \ldots + w_e$. For
example, with $\zeta$ as in (29), and

$$\eta = y_1, y_2, y_3, \ldots \qquad (30)$$

we have $M_{2[x]}^{2,1}(\zeta,\eta) = x, z_2, z_3, y_4, z_5, z_6, y_7, \ldots$
Note that if $w_n = \infty$, then $M_{r[x]}^{w_1,\ldots,w_n}$ may be used
to model an n-phase cell, where each phase e=1,...,n-1 executes for
$w_e$ time units, and the last phase executes from time
$r + w_1 + \ldots + w_{n-1}$ until infinity.
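
The sampling rule of the multiplexer can be sketched on finite prefixes as follows. The list representation, the explicit `length` argument and the name `multiplex` are assumptions of this sketch; the example reproduces the output quoted above for the ratio 2:1 with r = 2.

```python
import math

def multiplex(inputs, weights, length, r=1, x=None):
    """First `length` outputs of the multiplexer (time t is 1-based).

    weights may end with math.inf to model a last phase that never ends.
    """
    period = sum(weights)
    out = []
    for t in range(1, length + 1):
        if t < r:
            out.append(x)
            continue
        rem = (t - r) if math.isinf(period) else (t - r) % period
        total = 0
        for seq, w in zip(inputs, weights):
            total += w
            if rem < total:              # first input whose window contains rem
                out.append(seq[t - 1])
                break
    return out

zeta = [f"z{t}" for t in range(1, 8)]
eta = [f"y{t}" for t in range(1, 8)]
mux_out = multiplex([zeta, eta], [2, 1], length=7, r=2, x="x")
two_phase = multiplex([zeta, eta], [2, math.inf], length=7)
```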


5) The piping operator: $P_n^k(\xi_1,\ldots,\xi_n) : [R_\delta]^n \to R_\delta$,
concatenates the first k elements of its operands into one long
sequence. For example, if $\zeta$ and $\eta$ are as in (29) and (30),
respectively, then

$$P_2^3(\zeta,\eta) = z_1, z_2, z_3, y_1, y_2, y_3, \delta, \delta, \ldots$$

For simplicity, we write $P_n^k(\xi_1,\ldots,\xi_n)$ as $P^k_{e=1,n}(\xi_e)$.
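
A minimal sketch of the piping operator on finite prefixes follows; the name `pipe` is hypothetical, and the trailing don't care elements of the infinite sequence are simply not represented.

```python
def pipe(operands, k):
    """Concatenate the first k elements of each operand, in order."""
    out = []
    for seq in operands:
        out.extend(seq[:k])
    return out

zeta = ["z1", "z2", "z3", "z4"]
eta = ["y1", "y2", "y3", "y4"]
piped = pipe([zeta, eta], 3)
```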


The  following  properties,  may  be  directly  verified  from  the 
definitions  of  the  sequence  operators: 

Property  Pi:  0s  flx  C  *  n x  0s  C 

Property  P2: 


0r  wl, . . ,wn 
nx  M1  Ul' 


"<n>  ^Mrii;[;r(nx  <1 . nx  <n> 


Property  P3:  If  ’op*  is  an  element-wise  operation,  then 


n*  [«  ’op*  77]  =  'op'  nS 

0  [C  'op'  77]  *  0  <  'op'  9B77 
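
Property P3 can be spot-checked on finite prefixes. The sketch below assumes that the shift inserts don't care elements (x omitted), represents the don't care element by `None`, and uses hypothetical helper names; it illustrates the property on one example and is not a proof.

```python
DELTA = None  # stands for the don't care element

def elementwise(op, xs, ys):
    """Apply op element-wise; any operation involving DELTA yields DELTA."""
    return [DELTA if a is DELTA or b is DELTA else op(a, b) for a, b in zip(xs, ys)]

def shift(seq, r):
    """Insert r don't care elements in front (r >= 0)."""
    return [DELTA] * r + list(seq)

def spread(seq, s):
    """Insert s don't care elements between successive elements."""
    out = []
    for idx, item in enumerate(seq):
        if idx:
            out.extend([DELTA] * s)
        out.append(item)
    return out

xi = [3, 1, 4, 1]
eta = [2, 7, 1, 8]
lhs_shift = shift(elementwise(min, xi, eta), 2)
rhs_shift = elementwise(min, shift(xi, 2), shift(eta, 2))
lhs_spread = spread(elementwise(min, xi, eta), 1)
rhs_spread = elementwise(min, spread(xi, 1), spread(eta, 1))
```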

